migration for inactive user purge #6676

smithellis · 2025-05-23T13:14:42Z

1 - gets all users we consider inactive (3 years or more since last login)
2 - divides them into users with content and users without content
3 - hard-deletes the users without content using the _base_manager
4 - pushes the remaining users through the deletion pipeline
5 - batches the work to manage resources

escattone · 2025-05-23T17:18:19Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

+    utils_module = importlib.import_module('kitsune.users.utils')
+    delete_user_pipeline = utils_module.delete_user_pipeline


I don't see the need for this. I think it's safe to do from kitsune.users.utils import delete_user_pipeline outside of this function.

escattone · 2025-05-23T18:11:49Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

+    last_id = 0
+    while True:
+        batch = list(
+            query.filter(id__gt=last_id)
+            .order_by('id')
+            .annotate(has_content=has_content_criteria)
+            [:batch_size]
+        )
+        if not batch:
+            break
+        last_id = max(user.id for user in batch)


I think you could replace all of this and gain efficiency (fewer queries) by using the iterator method that Django provides (as we do in 0034_batch_delete_non_migrated_users.py). So something like:

    current_batch = []
    for user in query.annotate(has_content=has_content_criteria).iterator(chunk_size=batch_size):
        current_batch.append(user)
        if len(current_batch) >= batch_size:
            # Do the work ...
            current_batch = []

I talked myself out of iterator when I was sorting into two groups; but when I looked at your note below, I am back to being in camp iterator().
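One detail the sketch above leaves open is the last, partially filled batch: if the loop only does work when `len(current_batch) >= batch_size`, any leftover users still need processing after the loop ends. A minimal, framework-free illustration of that pattern follows. The helper name `iter_batches` is hypothetical, and a plain `range` stands in for `query.iterator(chunk_size=batch_size)`; this is not the code from the migration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, including the final partial batch.

    Hypothetical helper for illustration; any iterable can be passed in,
    for example the result of a queryset's iterator() call.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    current_batch: List[T] = []
    for item in items:
        current_batch.append(item)
        if len(current_batch) >= batch_size:
            yield current_batch
            current_batch = []
    if current_batch:
        # Flush the leftover items that did not fill a whole batch.
        yield current_batch


# Stand-in data: seven "users" processed in batches of three.
batches = list(iter_batches(range(1, 8), batch_size=3))
print(batches)
```

Wrapping the accumulation in a generator keeps the "do the work" step in one place, so the full batches and the final short batch go through the same code path.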

escattone · 2025-05-23T18:45:52Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

+        users_with_content = [user for user in batch if user.has_content]
+        users_no_content = [user for user in batch if not user.has_content]


Here you're iterating over 1k users twice, and then later iterating over users_no_content to get their ids. You could do all of that in one iteration, something like:

    for user in current_batch:
        if user.has_content:
            # run the pipeline
        else:
            users_no_content.append(user.id)
    else:
        User._base_manager.filter(id__in=users_no_content).delete()

This is a great catch. Thanks!
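For reference, the single-pass idea can be sketched without Django. In the version below, `FakeUser`, `process_batch`, `run_pipeline` and `bulk_delete` are all hypothetical stand-ins (for the annotated user rows, `delete_user_pipeline`, and `User._base_manager.filter(id__in=...).delete()` respectively). Because the loop has no `break`, a `for ... else` clause would always run, so this sketch places the bulk delete as a plain statement after the loop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FakeUser:
    """Stand-in for a user row annotated with has_content (illustration only)."""
    id: int
    has_content: bool


def process_batch(
    batch: Iterable[FakeUser],
    run_pipeline: Callable[[FakeUser], None],
    bulk_delete: Callable[[List[int]], None],
) -> None:
    """Visit each user once: pipeline for content users, one bulk delete for the rest."""
    users_no_content: List[int] = []
    for user in batch:
        if user.has_content:
            run_pipeline(user)
        else:
            users_no_content.append(user.id)
    if users_no_content:
        # A single call for the whole batch instead of one per user.
        bulk_delete(users_no_content)


# Record what each callback receives so the behaviour is visible.
piped: List[FakeUser] = []
deleted: List[int] = []
process_batch(
    [FakeUser(1, True), FakeUser(2, False), FakeUser(3, False)],
    run_pipeline=piped.append,
    bulk_delete=deleted.extend,
)
print([user.id for user in piped], deleted)
```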

escattone

r+wc

escattone · 2025-05-27T15:21:13Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

+        # Progress reporting every 1000 users
+        if processed_count % 1000 == 0:
+            elapsed_time = time.time() - start_time
+            progress_pct = (processed_count/total_users*100) if total_users > 0 else 0


Nit. Since you've already checked if total_users is zero above (and returned in that case), you don't need to check again here, so I think you can remove if total_users > 0 else 0

escattone · 2025-05-27T15:25:51Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

+            avg_time = elapsed_time / processed_count if processed_count > 0 else 0
+            remaining_time = (total_users - processed_count) * avg_time if processed_count > 0 else 0
+            current_rate = processed_count / elapsed_time * 60 if elapsed_time > 0 else 0


Nit. Same here. I think for all of these lines, the processed_count and elapsed_time values are guaranteed to be greater than zero, so I don't think you need the if x > 0 else 0 checks.
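With the guards removed, the progress arithmetic reduces to a few divisions. The sketch below restates it as a standalone function, assuming (as the review comments describe) that `total_users`, `processed_count` and `elapsed_time` are all greater than zero by the time it runs. The function name `progress_stats` and the dictionary keys are hypothetical.

```python
def progress_stats(processed_count: int, total_users: int, elapsed_time: float) -> dict:
    """Compute progress figures for a batch job.

    Assumes processed_count > 0, total_users > 0 and elapsed_time > 0,
    so no zero-division guards are included.
    """
    avg_time = elapsed_time / processed_count  # seconds per user so far
    return {
        "progress_pct": processed_count / total_users * 100,
        "avg_time": avg_time,
        "remaining_time": (total_users - processed_count) * avg_time,  # seconds
        "users_per_minute": processed_count / elapsed_time * 60,
    }


# Example: 1000 of 4000 users processed in 50 seconds.
stats = progress_stats(processed_count=1000, total_users=4000, elapsed_time=50.0)
print(stats)
```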

escattone · 2025-05-27T15:45:11Z

@smithellis Forgot to mention one more thing. Just FYI, you can use migrations.RunPython.noop instead of defining a reverse function that does nothing. It's equivalent.

akatsoulas · 2025-05-28T08:24:04Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

+
+    cutoff_date = timezone.now() - timedelta(days=3*365)
+
+    query = User.objects.filter(last_login__lt=cutoff_date).annotate(has_content=has_content_criteria)


Why are you using annotate here instead of filter/exclude/Exists?

You cannot pass a Q object directly to annotate(), at least not in the 4.2.x version that we are using. Compare the docs: Annotation in 5.2 vs 4.2

This use case absolutely works - I think it's just not specifically called out in the 4.2 docs but is in later docs. I can run this query and see the output and it builds valid sql which executes properly.

I'm annotating here so we can later divide our users into those with and those without content, so we can execute a quicker delete process on non-content users.

akatsoulas · 2025-05-28T08:47:04Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

@@ -0,0 +1,131 @@
+from datetime import timedelta
+import time
+import importlib


Is this used anywhere?

No, leftover from prior change. Weird pre-commit didn't fuss at me.

akatsoulas · 2025-05-28T08:50:37Z

kitsune/users/migrations/0035_batch_delete_inactive_users.py

+    """
+    Delete users who haven't logged in for over three years.
+    """
+    User = get_user_model()


This needs to be User = apps.get_model("auth", "User") in migrations similar to the previous one (0034)

This is necessary because the delete_user_pipeline function needs an actual User instance vs. a historical model.

migration for inactive user purge

3136a4a

smithellis mentioned this pull request May 23, 2025

Migration to bulk delete inactive users regardless of content mozilla/sumo#2345

Open

escattone reviewed May 23, 2025

performance improvements

efdf382

escattone approved these changes May 27, 2025

akatsoulas requested changes May 28, 2025

smithellis added 2 commits May 30, 2025 00:45

refactor to avoid passing Q to annotate

f555876

remove redundant zero tests

e26291f
